Alibaba Group
Abstract:Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.
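The retrieval step described in the abstract above can be illustrated with a minimal sketch. It assumes the frozen encoder has already produced embeddings, uses cosine similarity for the search, and replaces the transformer-based ICL module with a simple similarity-weighted average of the retrieved positions. The function names, the `temperature` parameter, and the weighting rule are illustrative assumptions, not the paper's method.

```python
import numpy as np


def retrieve_top_k(query_emb, db_embs, k):
    """Return indices and cosine similarities of the k most similar database entries."""
    q = query_emb / np.linalg.norm(query_emb)
    d = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity to every fingerprint
    idx = np.argsort(-sims)[:k]       # highest similarity first
    return idx, sims[idx]


def estimate_position(query_emb, db_embs, db_positions, k=3, temperature=0.1):
    """Stand-in for the ICL module (assumption): softmax-weighted mean of retrieved positions."""
    idx, sims = retrieve_top_k(query_emb, db_embs, k)
    w = np.exp((sims - sims.max()) / temperature)  # numerically stable softmax weights
    w /= w.sum()
    return w @ db_positions[idx]
```

Because the per-scene database is passed in as an argument, moving to a new scene only means supplying different `db_embs` and `db_positions`; nothing in the estimator is refit, which is the property the abstract attributes to externalizing scene information.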
Abstract:Multicast transmission in millimeter-wave (mmWave) networks is fundamentally limited by the weakest user, and blockages further exacerbate this problem. Large-scale reconfigurable intelligent surfaces (XL-RIS) offer a promising solution by providing high array gain to overcome blockages. However, the large aperture of XL-RIS significantly expands the near-field region, creating a hybrid-field scenario where some users lie in the near field while others remain in the far field. Existing hybrid-field studies on XL-RIS have primarily focused on channel estimation and deployment optimization, leaving multicast capacity analysis unexplored. This paper investigates the fundamental capacity limits of XL-RIS-assisted multicast communications in hybrid-field scenarios. For the fundamental two-user case consisting of one near-field and one far-field user, we derive the optimal closed-form covariance matrix and optimize the RIS phase shifts via manifold optimization. We establish that the multicast capacity scales as $\Theta(\log_2(MN))$ as the number of transmit antennas $M$ and/or RIS elements $N$ grow large, and prove this scaling is order-tight. Numerical results validate the bounds and show the impact of $M$, $N$, and distance on the multicast rate.
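For readers unfamiliar with the weakest-user limitation mentioned above, the two-user multicast capacity is commonly written as a max-min problem. The form below is a generic textbook sketch, not the paper's exact formulation: the effective channels $\mathbf{h}_k(\boldsymbol{\theta})$, the power budget $P$, the noise power $\sigma^2$, and the constants $c_1, c_2$ are assumed notation that does not appear in the abstract.

```latex
% Generic max-min multicast capacity (assumed notation):
% Q is the transmit covariance, theta the unit-modulus RIS phase vector,
% NF / FF index the near-field and far-field users.
C = \max_{\substack{\mathbf{Q} \succeq \mathbf{0},\ \operatorname{tr}(\mathbf{Q}) \le P \\ |\theta_n| = 1,\ n = 1,\dots,N}}
    \ \min_{k \in \{\mathrm{NF},\, \mathrm{FF}\}}
    \log_2\!\left( 1 + \frac{\mathbf{h}_k(\boldsymbol{\theta})^{\mathsf{H}}\, \mathbf{Q}\, \mathbf{h}_k(\boldsymbol{\theta})}{\sigma^2} \right)

% Meaning of the Theta scaling claim: for sufficiently large MN there exist
% constants 0 < c_1 <= c_2 such that
c_1 \log_2(MN) \;\le\; C \;\le\; c_2 \log_2(MN)
```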
Abstract:Joint unicast and multi-group multicast transmission assisted by reconfigurable intelligent surfaces (RIS) and rate-splitting multiple access (RSMA) is a promising enabler for 6G services. However, existing RSMA schemes for such scenarios split only unicast messages while leaving multicast messages intact, limiting the degrees of freedom available for interference management. To this end, we propose a joint rate-splitting framework that splits both unicast and multicast information, together with two RSMA schemes built on it. The common-common fusion (CCF-RSMA) scheme encodes the unicast common part into the global multicast common stream, while the private-common fusion (PCF-RSMA) scheme merges it with the group-specific multicast private part. For each scheme, we formulate energy efficiency (EE) maximization problems under both perfect and imperfect channel state information, and jointly optimize active beamforming, RIS phase shifts, and rate allocation parameters. Simulation results demonstrate that the proposed schemes significantly outperform the comparative schemes in terms of EE, supporting the effectiveness of the proposed framework. Moreover, CCF-RSMA is more favorable in scenarios with larger groups and higher unicast quality-of-service (QoS) demands, whereas PCF-RSMA is better suited for scenarios with smaller groups and higher multicast QoS demands.
Abstract:Agentic search requires language model agents to explore many sources and answer complex information-seeking questions. Scaling test-time compute is a promising way to improve these agents, but current approaches can fail, because correct answers are often sparse and score-based selection depends on model calibration. We propose FineVerify, a fine-grained self-verification framework that decomposes each question into checkable sub-questions, verifies sampled candidates against each sub-question, and selects the candidate with the highest aggregated score. This per-check structure turns selection into simpler local judgments and produces scores under the same explicit criteria. Across four agentic search benchmarks and two models, FineVerify consistently outperforms standard scaling baselines. With only four sampled trajectories, it improves GPT-5-mini by 8.2 accuracy points and Gemini-3-flash by 5.6% on average. With 12 samples, FineVerify enables GPT-5-mini to surpass frontier GPT-5 on BrowseComp-Plus. Beyond accuracy, FineVerify produces interpretable verification traces that help audit benchmark errors, suggesting broader applications for inspecting agentic search systems. Code and data are available at https://github.com/XuZhao0/fineverify
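The selection procedure described above (decompose, verify per sub-question, aggregate, pick the best candidate) can be sketched as follows. This is a minimal illustration under stated assumptions: `verify` is a hypothetical callable standing in for the model-based verifier, and aggregating by the mean of per-check scores is an assumption, since the abstract does not specify the aggregation rule.

```python
from typing import Callable, List, Sequence, Tuple


def select_candidate(
    sub_questions: Sequence[str],
    candidates: Sequence[str],
    verify: Callable[[str, str], float],
) -> Tuple[str, List[float]]:
    """Score each candidate against every sub-question and return the best one.

    verify(candidate, sub_question) is a hypothetical checker returning a
    score in [0, 1]; the mean over sub-questions is an assumed aggregation.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    if not sub_questions:
        raise ValueError("at least one sub-question is required")

    scores = [
        sum(verify(c, sq) for sq in sub_questions) / len(sub_questions)
        for c in candidates
    ]
    # Ties resolve to the earliest candidate, since max keeps the first maximum.
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores
```

Every candidate is scored against the same list of checks, so the scores are directly comparable across candidates, and the per-check results can be logged as a verification trace.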
Abstract:Test-Time Learning (TTL) enables language agents to iteratively refine their performance through repeated interactions with the environment at inference time. At the core of TTL is an adaptation policy that updates the actor policy based on experience from previous episodes, thereby improving future behavior. Existing methods rely on fixed, hand-crafted adaptation policies rather than optimizing them for downstream improvement. We argue that optimal adaptation policies should be learned from task environments, not hand-engineered based on human intuition. To achieve this, we introduce Meta-TTL, a framework that formulates the discovery of effective adaptation policies as a bi-level optimization problem. Within this framework, the inner loop executes the standard TTL process, measuring how effectively a candidate adaptation policy helps an agent correct errors across sequential episodes. Guided by the agent's performance, the outer loop employs evolutionary search over a diverse distribution of training tasks to iteratively refine the adaptation policy. We evaluate Meta-TTL on Jericho and WebArena-Lite across both in-distribution (ID) and out-of-distribution (OOD) settings, using multiple meta-agent backbones. Results on both benchmarks show that Meta-TTL consistently outperforms hand-crafted baselines, suggesting that the optimized adaptation policy encodes transferable strategies that generalize beyond the training task distribution.
Abstract:As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critical bottleneck. Existing context management methods typically commit to a single fixed strategy throughout the entire trajectory. Such static designs may work well in some states, but they cannot adapt as the usefulness and reliability of the accumulated context evolve during long-horizon search. To formalize this challenge, we introduce a probabilistic framework that characterizes long-horizon success through two complementary dimensions: search efficiency and terminal precision. Building on this perspective, we propose AgentSwing, a state-aware adaptive parallel context management routing framework. At each trigger point, AgentSwing expands multiple context-managed branches in parallel and uses lookahead routing to select the most promising continuation. Experiments across diverse benchmarks and agent backbones show that AgentSwing consistently outperforms strong static context management methods, often matching or exceeding their performance with up to $3\times$ fewer interaction turns while also improving the ultimate performance ceiling of long-horizon web agents. Beyond the empirical gains, the proposed probabilistic framework provides a principled lens for analyzing and designing future context management strategies for long-horizon agents.
Abstract:Reconfigurable antennas (RAs) utilize the electromagnetic (EM) domain to provide dynamic control over antenna radiation patterns, which offers an effective way to enhance power efficiency in wireless links. Unlike conventional arrays with fixed element patterns, RAs enable on-demand beam-pattern synthesis by directly controlling each antenna's EM characteristics. While existing research on RAs has primarily focused on improving spectral efficiency, this paper explores their application to downlink localization. Moreover, the majority of existing works focus on far-field scenarios and pay little attention to near-field (NF) conditions. Motivated by these gaps, we consider a synthesis model in which each antenna generates desired beam patterns from a finite set of EM basis functions. We then formulate a joint optimization problem for the baseband (BB) and EM precoders with the objective of minimizing the user equipment (UE) position error bound (PEB) in NF conditions. Our analytical derivations and extensive simulation results demonstrate that the proposed hybrid precoder design for RAs significantly improves UE positioning accuracy compared to traditional non-reconfigurable arrays.
Abstract:Current benchmarks for code agents primarily assess narrow, repository-specific fixes, overlooking critical real-world challenges such as cross-repository reasoning, domain-specialized problem solving, dependency-driven migration, and full-repository generation. To address this gap, we introduce BeyondSWE, a comprehensive benchmark that broadens existing evaluations along two axes - resolution scope and knowledge scope - using 500 real-world instances across four distinct settings. Experimental results reveal a significant capability gap: even frontier models plateau below 45% success, and no single model performs consistently across task types. To systematically investigate the role of external knowledge, we develop SearchSWE, a framework that integrates deep search with coding abilities. Our experiments show that search augmentation yields inconsistent gains and can in some cases degrade performance, highlighting the difficulty of emulating developer-like workflows that interleave search and reasoning during coding tasks. This work offers both a realistic, challenging evaluation benchmark and a flexible framework to advance research toward more capable code agents.
Abstract:Integrated sensing and communication (ISAC) is a key technology for enabling a wide range of applications in future wireless systems. However, the sensing performance is often degraded by model mismatches caused by geometric errors (e.g., position and orientation) and hardware impairments (e.g., mutual coupling and amplifier non-linearity). This paper focuses on the angle estimation performance with antenna arrays and tackles the critical challenge of array beam pattern calibration for ISAC systems. To assess calibration quality from a sensing perspective, a novel performance metric that accounts for angle estimation error, rather than beam pattern similarity, is proposed and incorporated into a differentiable loss function. Additionally, a cooperative calibration framework is introduced, allowing multiple user equipments to iteratively optimize the beam pattern based on the proposed loss function and local data, and collaboratively update global calibration parameters. The proposed models and algorithms are validated using real-world beam pattern measurements collected in an anechoic chamber. Experimental results show that the angle estimation error can be reduced from $\textbf{1.01}^\circ$ to $\textbf{0.11}^\circ$ in 2D calibration scenarios, and from $\textbf{5.19}^\circ$ to $\textbf{0.86}^\circ$ in 3D calibration scenarios.
Abstract:The rapid advancement of connected and autonomous vehicles has driven a growing demand for precise and reliable positioning systems capable of operating in complex environments. Meeting these demands requires an integrated approach that combines multiple positioning technologies, including wireless-based systems, perception-based technologies, and motion-based sensors. This paper presents a comprehensive survey of wireless-based positioning for vehicular applications, with a focus on satellite-based positioning (such as global navigation satellite systems (GNSS) and low-Earth-orbit (LEO) satellites), cellular-based positioning (5G and beyond), and IEEE-based technologies (including Wi-Fi, ultrawideband (UWB), Bluetooth, and vehicle-to-vehicle (V2V) communications). First, the survey reviews a wide range of vehicular positioning use cases, outlining their specific performance requirements. Next, it explores the historical development, standardization, and evolution of each wireless positioning technology, providing an in-depth categorization of existing positioning solutions and algorithms, and identifying open challenges and contemporary trends. Finally, the paper examines sensor fusion techniques that integrate these wireless systems with onboard perception and motion sensors to enhance positioning accuracy and resilience in real-world conditions. This survey thus offers a holistic perspective on the historical foundations, current advancements, and future directions of wireless-based positioning for vehicular applications, addressing a critical gap in the literature.